Lab 5 report

0716206陳昱丞

0716221余忠旻

Calculate the memory stall cycles:

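Simulated program execution cycles: using the assembly code provided for the lab, we counted how many times each instruction executes based on how the loops run, and derived a formula from those counts.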

Fig(a):  
 hit time cycle : send the address+access single cache content+send a word of data  
 =1+2+1=4  
  
 cache miss penalty : send the address+8\*(send the address+access memory  
 content+send a word of data+access single cache content)+access single cache  
 content+send a word of data  
 =1 + 8\*(1+100+1+2) + 2 + 1 = 836  
  
 total memory stall cycles : (hit time cycle\*cache hit number)+(cache miss penalty\*cache  
 miss number)
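
The Fig(a) arithmetic above can be checked with a short Python sketch. The latencies are the ones used in the report; the constant and function names, and any hit/miss counts passed to `fig_a_total_cycles`, are hypothetical and chosen only for illustration.

```python
# Sketch of the Fig(a) cost model; all names here are illustrative assumptions.
SEND_ADDR = 1        # cycles to send the address
CACHE_ACCESS = 2     # cycles to access the single-level cache
MEM_ACCESS = 100     # cycles to access main memory
SEND_WORD = 1        # cycles to send one word of data
WORDS_PER_BLOCK = 8  # 1-word-wide memory, so 8 transfers fill one block


def fig_a_hit_time():
    return SEND_ADDR + CACHE_ACCESS + SEND_WORD


def fig_a_miss_penalty():
    # Each word of the block needs its own memory access and cache write.
    per_word = SEND_ADDR + MEM_ACCESS + SEND_WORD + CACHE_ACCESS
    return SEND_ADDR + WORDS_PER_BLOCK * per_word + CACHE_ACCESS + SEND_WORD


def fig_a_total_cycles(hits, misses):
    # Same formula as the report: hit cost times hits plus miss cost times misses.
    return fig_a_hit_time() * hits + fig_a_miss_penalty() * misses


print(fig_a_hit_time())      # 4
print(fig_a_miss_penalty())  # 836
```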

Fig(b):  
 hit time cycle : send the address+access single cache content+send a word of data  
 =1+2+1=4  
  
 cache miss penalty : send the address+(send the address+access memory  
 content+send a word of data+access single cache content)+access single cache  
 content+send a word of data  
 =1 + (1+100+1+2) + 2 + 1 = 108  
  
 total memory stall cycles : (hit time cycle\*cache hit number)+(cache miss penalty\*cache  
 miss number)
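
As a quick check of the Fig(b) miss penalty, the sketch below adds up the same terms listed above. The names are hypothetical; only the cycle counts come from the report.

```python
# Sketch of the Fig(b) miss penalty; names are illustrative assumptions.
SEND_ADDR, CACHE_ACCESS, MEM_ACCESS, SEND_DATA = 1, 2, 100, 1


def fig_b_miss_penalty():
    # The 8-word-wide memory fills the whole block in a single transfer.
    block_fill = SEND_ADDR + MEM_ACCESS + SEND_DATA + CACHE_ACCESS
    # After the fill, the requested word is read from the cache and returned.
    return SEND_ADDR + block_fill + CACHE_ACCESS + SEND_DATA


print(fig_b_miss_penalty())  # 108
```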

Fig(c):  
 L1 hit time cycle : send the address+access L1 cache content+send a word of data  
 =1+1+1=3  
  
 L1 miss L2 hit time cycle : send the address+4\*(send the address+access L2 cache  
 content+send a word of data+access L1 cache content)+access L1 cache content+  
 send a word of data  
 =1 + 4\*(1+10+1+1) + 1 + 1= 55  
  
 L2 cache miss penalty : send the address + 32\*(send the address+access memory  
 content+send a word of data+access L2 cache content) + 4\*(send the address+  
 access L2 cache content+send a word of data+access L1 cache content) +access L1  
 cache content+send a word of data  
 =1 + 32\*(1+100+1+10)+ 4 \* (1+10+1+1)+ 1 + 1 = 3639  
 total memory stall cycles : (L1 hit time cycle \* L1 hit number)+(L1 miss L2 hit time cycle \* L1  
 miss L2 hit number)+(L2 cache miss penalty \* L2 miss number)
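
The three Fig(c) costs can be reproduced with the following Python sketch. The transfer counts (4 and 32) and latencies are taken from the formulas above; the function and constant names, and the counts passed to `fig_c_total_cycles`, are hypothetical.

```python
# Sketch of the Fig(c) two-level cost model; names are illustrative assumptions.
SEND_ADDR = 1
L1_ACCESS = 1
L2_ACCESS = 10
MEM_ACCESS = 100
SEND_WORD = 1
L1_FILL_TRANSFERS = 4   # transfers from L2 needed to fill one L1 block
L2_FILL_TRANSFERS = 32  # transfers from memory needed to fill one L2 block


def l1_hit_time():
    return SEND_ADDR + L1_ACCESS + SEND_WORD


def l1_fill_from_l2():
    # Cost of copying one L1 block out of L2.
    return L1_FILL_TRANSFERS * (SEND_ADDR + L2_ACCESS + SEND_WORD + L1_ACCESS)


def l1_miss_l2_hit_time():
    return SEND_ADDR + l1_fill_from_l2() + L1_ACCESS + SEND_WORD


def l2_miss_penalty():
    # Fill the L2 block from memory, then fill L1 from L2, then serve the word.
    l2_fill = L2_FILL_TRANSFERS * (SEND_ADDR + MEM_ACCESS + SEND_WORD + L2_ACCESS)
    return SEND_ADDR + l2_fill + l1_fill_from_l2() + L1_ACCESS + SEND_WORD


def fig_c_total_cycles(l1_hits, l2_hits, l2_misses):
    return (l1_hit_time() * l1_hits
            + l1_miss_l2_hit_time() * l2_hits
            + l2_miss_penalty() * l2_misses)


print(l1_hit_time())          # 3
print(l1_miss_l2_hit_time())  # 55
print(l2_miss_penalty())      # 3639
```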

Compare and discuss the difference among the three memory organization

Memory Stall Result : Fig(b) < Fig(a) < Fig(c)

Fig(a) :

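Although its cache size, block size, and miss rate are the same as Fig(b)'s, its memory is only 1 word wide. Filling an 8-word block therefore takes eight memory accesses and eight data transfers, so it is less efficient than Fig(b).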

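Why Fig(b) has the fewest memory stalls :

Fig(b) has a memory width of 8 words. Compared with Fig(a), memory can deliver 8 words at a time, so an 8-word block needs only one memory access and one data transfer, and the stall cycles are correspondingly fewer.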

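Why Fig(c) has the most memory stalls :

In Fig(c)'s two-level memory organization, the L1 hit time is indeed lower and the L2 miss rate is indeed lower. However, the L2 miss penalty is very high (3639 cycles, compared with 836 for Fig(a) and 108 for Fig(b)), so every L2 miss produces a large number of memory stall cycles, which ultimately makes it the least efficient of the three.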